Current Issue: January - March | Volume: 2019 | Issue Number: 1 | Articles: 5
Deep learning is bringing breakthroughs to many computer vision subfields, including Optical Music Recognition (OMR), which has seen a series of improvements in musical symbol detection achieved by using generic deep learning models. However, so far, each such proposal has been based on a specific dataset and different evaluation criteria, which has made it difficult to quantify the new deep learning-based state of the art and assess the relative merits of these detection models on music scores. In this paper, a baseline for general detection of musical symbols with deep learning is presented. We consider three datasets of heterogeneous typology but with the same annotation format and three neural models of different nature, and establish their performance in terms of a common evaluation standard. The experimental results confirm that direct music object detection with deep learning is indeed promising, but at the same time illustrate some of the domain-specific shortcomings of general detectors. A qualitative comparison then suggests avenues for OMR improvement, based both on properties of the detection models and on how the datasets are defined. To the best of our knowledge, this is the first time that competing music object detection systems from the machine learning paradigm have been directly compared to each other. We hope that this work will serve as a reference for measuring the progress of future developments in OMR music object detection.
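As a rough illustration of what such a baseline entails, the sketch below runs a generic off-the-shelf detector (Faster R-CNN from torchvision) over a score page and thresholds its detections. The checkpoint path, class count, and file names are hypothetical; the paper's actual models and datasets may differ.

```python
# Minimal sketch: applying a generic object detector to a score page.
# Assumes a Faster R-CNN fine-tuned on a symbol-annotated dataset;
# the checkpoint path and class count below are hypothetical.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

NUM_SYMBOL_CLASSES = 71 + 1  # hypothetical: 71 symbol types + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=NUM_SYMBOL_CLASSES)
model.load_state_dict(torch.load("omr_frcnn.pt"))  # hypothetical checkpoint
model.eval()

page = to_tensor(Image.open("score_page.png").convert("RGB"))
with torch.no_grad():
    pred = model([page])[0]          # dict with 'boxes', 'labels', 'scores'

keep = pred["scores"] > 0.5          # simple confidence threshold
for box, label in zip(pred["boxes"][keep], pred["labels"][keep]):
    print(label.item(), box.tolist())  # class id and [x1, y1, x2, y2]
```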
A new architecture for melody extraction from polyphonic music is explored in this paper. Specifically, chromagrams are first constructed through the harmonic pitch class profile (HPCP) to measure the salience of the melody, and chroma-level notes are tracked by dynamic programming. Then, note detection is performed according to chroma-level note differences between adjacent frames. Next, note pitches are coarsely mapped by maximizing the salience of each note, followed by fine tuning to fit the dynamic variation within each note. Finally, voicing detection is carried out to determine the presence of melody according to the salience of the fine-tuned notes. Note-level pitch mapping and fine tuning avoid pitch shifting between different octaves or between notes within one note duration. Several experiments have been conducted to evaluate the performance of the proposed method. The experimental results show that the proposed method can track the dynamic pitch changes within each note and performs well at different signal-to-accompaniment ratios. However, its performance on deep vibratos and pitch glides still needs to be improved.
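A minimal sketch of the first two stages, chroma salience followed by dynamic-programming note tracking, might look as follows. librosa's CQT chroma is used here as a stand-in for the HPCP, and the transition penalty is an illustrative value, not taken from the paper.

```python
# Sketch of the chroma + dynamic-programming tracking stage.
# librosa's CQT chroma stands in for the HPCP described in the paper;
# the transition penalty is an illustrative assumption.
import numpy as np
import librosa

y, sr = librosa.load("mix.wav", sr=44100)
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)   # (12, n_frames) salience

n_bins, n_frames = chroma.shape
penalty = 0.3                                     # cost of switching chroma bin
cost = np.zeros((n_bins, n_frames))
back = np.zeros((n_bins, n_frames), dtype=int)
cost[:, 0] = chroma[:, 0]
for t in range(1, n_frames):
    # reward high salience, penalise jumps between chroma bins
    trans = cost[:, t - 1][None, :] - penalty * (
        np.arange(n_bins)[:, None] != np.arange(n_bins)[None, :])
    back[:, t] = trans.argmax(axis=1)
    cost[:, t] = chroma[:, t] + trans.max(axis=1)

# backtrack the best chroma-level note track
track = np.empty(n_frames, dtype=int)
track[-1] = cost[:, -1].argmax()
for t in range(n_frames - 1, 0, -1):
    track[t - 1] = back[track[t], t]
```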
Automatic audio announcement systems are widely used in public places such as transportation vehicles and facilities, hospitals, and banks. However, these systems cannot be used by people with hearing impairment, which brings great inconvenience to their lives. In this paper, an approach to audio announcement detection and recognition for hearing-impaired people based on the smartphone is proposed, and a mobile phone application (app) is developed, taking the bank as a major application scenario. Using the app, users can sign up for alerts for their numbers, and the system then begins to detect audio announcements using the microphone on the smartphone. For each audio announcement detected, the speech within it is recognized and the text is displayed on the screen of the phone. When the number the user entered is announced, an alert is given by vibration. For audio announcement detection, a method based on audio segment classification and postprocessing is proposed, which uses an SVM classifier trained on audio announcements and environmental noise collected in banks. For announcement speech recognition, an ASR engine is developed using a GMM-HMM-based acoustic model and a finite state transducer (FST) based grammar. The acoustic model is trained on audio announcement speech collected in banks, and the grammar is human-defined according to the patterns used by the automatic audio announcement systems. Experimental results show that character error rates (CERs) of around 5% can be achieved for the announcement speech, which shows the feasibility of the proposed method and system.
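The detection stage could be sketched as follows, assuming segment-level MFCC features, a binary SVM, and median-filter postprocessing; the window length, feature choice, and placeholder training data are assumptions, not the authors' exact configuration.

```python
# Sketch of segment classification for announcement detection:
# mean MFCCs per fixed-length segment, an SVM labelling each segment
# as announcement (1) or noise (0), and median-filter smoothing.
import numpy as np
import librosa
from sklearn.svm import SVC
from scipy.signal import medfilt

def segment_features(y, sr, seg_len=1.0):
    """Mean MFCC vector over each fixed-length segment."""
    hop = int(seg_len * sr)
    feats = []
    for start in range(0, len(y) - hop + 1, hop):
        mfcc = librosa.feature.mfcc(y=y[start:start + hop], sr=sr, n_mfcc=13)
        feats.append(mfcc.mean(axis=1))
    return np.array(feats)

# Placeholder training data: in practice, features from labelled
# announcement (1) and environment-noise (0) clips collected in banks.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 13))
y_train = rng.integers(0, 2, size=200)

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

y, sr = librosa.load("bank_hall.wav", sr=16000)
labels = clf.predict(segment_features(y, sr))
smoothed = medfilt(labels, kernel_size=5)  # drop isolated misclassifications
```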
An extra-cochlear stimulation system has been investigated as a less invasive alternative to the conventional cochlear implant; however, the system is used primarily as a speech-reading aid. The purpose of this study was to develop a speech encoding scheme for the extra-cochlear stimulation system to convey intelligible speech. A click-modulated speech sound (CMS) was created as a simulation of the extra-cochlear stimulation system. The CMS is a repetitive click with a repetition rate similar to the formant frequency transition of the original sound. Seven native Japanese speakers with normal hearing participated in the experiment. After listening to the CMS, synthesized from low-familiarity Japanese words, the subjects reported their perceptions. The results showed that the rates of correctly identified vowels and consonants were significantly higher than those of the control stimulus, suggesting that the CMS can generate at least partially intelligible vowel and consonant perceptions. In all, the speech encoding scheme could be applied to the extra-cochlear stimulation system to restore speech perception.
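One way to approximate the CMS construction is sketched below: a per-frame frequency track drives the instantaneous repetition rate of a click train. LPC-based first-formant estimation is used here as a plausible way to obtain that track; the frame sizes and estimation method are our assumptions, not necessarily the authors' procedure.

```python
# Illustrative sketch of a CMS-style signal: a click train whose
# repetition rate follows a per-frame formant-like frequency track.
import numpy as np
import librosa

y, sr = librosa.load("word.wav", sr=16000)
frame, hop = 512, 256

def first_formant(frame_sig, sr, order=12):
    """Lowest resonance frequency from the LPC polynomial roots."""
    a = librosa.lpc(frame_sig, order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    return freqs[0] if freqs else 0.0

rates = np.array([first_formant(y[i:i + frame], sr)
                  for i in range(0, len(y) - frame, hop)])

# Emit a click whenever the accumulated phase of the time-varying
# repetition rate wraps around one full cycle.
cms = np.zeros(len(rates) * hop)
phase = 0.0
for n in range(len(cms)):
    phase += rates[n // hop] / sr
    if phase >= 1.0:
        cms[n] = 1.0
        phase -= 1.0
```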
We investigated the user experiences of 117 Finnish children aged between 8 and 12 years in a trial of an English language learning programme that used automatic speech recognition (ASR). We used measures that encompassed both affective reactions and questions tapping into the children's sense of pedagogical utility. We also tested their perception of sound quality and compared reactions to game-based and non-game-based versions of the application. Results showed that children expressed higher affective ratings for the game version of the application than for the non-game version. Children also expressed a preference for playing with a friend over playing alone or within a group. They found the assessment of their speech useful, although they did not necessarily enjoy hearing their own voices. The results are discussed in terms of their implications for user interface (UI) design in speech learning applications for children.